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1. Introduction 

To find an appropriate generalized linear model (GLM) for regression data involves choos¬ 
ing the independent variables, the link function and the variance function (McCullagh and 
Nelder (1989)). Typically many different models have to be investigated and compared us¬ 
ing individual significance tests based on the asymptotic distribution of the deviance. As 
pointed out in Gelfand and Dey (1994) and Raftery (1996) this strategy cannot be used 
for comparing nonnested models. In addition, adjustments for multiple tests as well as 
power considerations are usually ignored. A Bayesian approach can avoid these difficulties 
and therefore Raftery (1996) developed approximate Bayes factors for GLM’s based on the 
Laplace method for integrals. These approximations require only the maximum likelihood 
estimate (MLE), the deviance and the observed or expected Fisher information. Kass and 
Raftery (1995) and Han and Carlin (2001) review Bayes factors and discuss different ways 
to calculate Bayes factors. 

In this paper, we extend the approach taken by Raftery (1996) to calculate approximate 
Bayes factors for GLM’s with a parametric link function. Even though GLM’s with canonical 
links (for definition see McCullagh and Nelder (1989)), such as the logit link in binomial 
regression, guarantee maximum information and a simple interpretation of the regression 
parameters, they do not always provide the best fit available to a given data set. Link 
misspecification can lead to substantial bias in the regression parameters and the mean 
response estimates (see Czado and Santner (1992) for binomial responses). One common 
approach to guard against link misspecification in generalized linear models is to embed the 
canonical link in a wide parametric class of links S = {F(-,'0),'0 € which includes the 
canonical link as a special case when -0 = t/’o- Many such parametric link classes for binary 
regression data have been proposed in the literature. Montfort and Otten (1976), Copenhaver 
and Mielke (1977), Aranda-Ordaz (1981) , Guerrero and Johnson (1982), Morgan (1983) and 
Whittmore (1983) proposed one-parameter families, while Prentice (1976), Pregibon (1980), 
Stukel (1988) and Czado (1992) considered two-parameter families. Link functions for the 
non-binary case were studied by Pregibon (1980) and Czado (1992, 1997). 

With the multitude of link families to choose from, the Bayes factor approach is able to 
compare different link families, regardless of whether they are nested or nonnested. We will 
illustrate this ability by using the two parameter link family suggested by Czado (1997) in 
several data sets. In addition, we are able to assess jointly the choice of the link family and 
of the set of independent variables. 

In Section 2 we define and discuss GLM’s with parametric links, while in Section 3 the 
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calculation of approximate Bayes factors including the choice of priors will be discussed. 
Applications will be given in Section 4 and Section 5 will provide a summary and discussion 
of the method presented. 

2. Generalized Linear Models with Parametric Links 

The following model for regression data with response and independent variables Xi = 
(xji, • • ■ Xip) for i = 1, • • • ,n will be used: 

1. Random Component: {T*,! < i < n} are independent and have a density of the 
form 

fyAyh^iA) = (/>)], ( 2 . 1 ) 

for some specified functions and c(-). The scale parameter (p is allowed to be 

known or unknown. 

2. Systematic Component: The linear predictors 77j(;d) = /5o+/5iXjiH- \-fipXip for 1 < 

i <n influence the response 1^. Here (3 = (/^o, • • • , /5p) are unknown regression param¬ 
eters. 

3. Parametric Link Component: The linear predictors are related to the mean 

Hi of Yi by Hi = for some F(-, ■0) in S = : ■0 e T} . 

We will restrict attention to link families S which contain only strictly monotone con¬ 
tinuous functions F(-, '0). Note that in conventional GLM notation the link g is equal to the 
inverse of F. An unknown scale parameter (p in (2.1) is typically estimated by an appropriate 
moment estimator involving the Pearson Statistic (McCullagh and Nelder (1989)). For a 
fixed link parameter xp we remain in the class of GLM’s, while this is no longer true if the 
link parameter xp and the regression parameter {3 are jointly estimated by the data. Czado 
and Munk (2000) show that the joint MLE S = 0,'ip) of 6 = {^j'lp) is strongly consistent 
and efficient under regularity conditions. 

As in the case for Box-Cox transformations (Box and Cox (1964)) one has to decide 
whether to make inference conditionally on an estimated link parameter or not. In the Box- 
Cox controversy, Hinkley and Runger (1984) and Box and Cox (1982) argued for following 
a conditional approach, while Bickel and Doksum (1981) and Carroll and Ruppert (1981) 
advocated an unconditional approach. We follow the arguments given in Draper (1995) and 
are interested in assessing model uncertainty. 
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We will illustrate our approach by using the link families suggested by Czado (1997). 
They allow separate modifications of the left and/or right tail of the link function and exhibit 
low variance inflation (Taylor (1988),Taylor et al. (1996)) for the regression parameters when 
the link is estimated from the data. This is due to the fact that the parametrization is locally 
orthogonal (see Cox and Reid (1987)). In addition, they are location and scale invariant (see 
Czado (1997)). For GLM’s with parametric links they are defined as follows: 


Error 

Distribution 

Parameter 

Restriction 

Canonical 

Link 

Link Family 

S= {F(-,0) : 0 e T} 

Normal 

Binomial 

Poisson 

Gamma 

Inv. Gaussian 

11 real 
/re (0,1) 

/r > 0 
/r > 0 
/r > 0 

F(7/) = 7/ 

= exp(T;) 
t l+exp(?)) 

F{'q) = exp(7/) 
F{'q) = 7 /“^ 

F{r]) = 7/“'^ 

F(7/,0) = h(7/,0) 

.1,) _ exp{h{ri,i,)) 

V n T ) l-i-exp(h(Ti,ip)) 

T(7/,0) = exp(h(7/,0)) 
F(7/,0) = [exp(h(7/,0))]-^ 
F(7/,0) = [exp(h(7/,0))]-'5 


where h{r]^ 'll)) is one of the following functions: 


Both tails: 


Right tail: 


Left tail: 




if >0 

^1 I — 

— (~^+b otherwise 

1P2 


hr 


hi{v,'ip2) 


, (7j+l)^l-l 
Ipl 

V 

V 

(-7j+l)^2-l 

'02 


if 77 > 0 
otherwise 

if 77 > 0 
otherwise 


( 2 . 2 ) 

(2.3) 

(2.4) 


It should be noted that the parameter restriction for the mean response makes a right tail 
modification for the Poisson and a left tail modification for the Gamma and inverse Gaussian 
cases the only sensible modifications to be considered. In all other cases all modifications of 
the link function are allowed. In particular, (2.4) is a special case of (2.2) with -0 = (1,'02)- 
Similarly (2.3) is a special case of (2.2) with r/? = {'ipi, 1). As '0i increases the right tail of 
G(-,'i/’) becomes lighter, while an increasing -02 makes the left tail of G(-,'0) lighter. The 
specification (2.3) is asymmetric if ^ 1, while the specification ((2.4)) is asymmetric if 
("02 ^ !)• The both tails specification (2.2) is asymmetric if "01 ^ '02- 


3. Approximate Bayes Factors for GLM’s with Parametric Link 

We are interested in assessing the evidence for a GLM with a noncanonical link as against 
the same GLM with a canonical link using Bayes factors. For this, we denote by a GLM 
with a fixed link parameter 0 for a given set of independent variables, while denotes 
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the same GLM using the canonical link. We denote the regression parameter corresponding 
to model by to indicate that the regression parameters are on different scales for 
different 'ip's. We are interested in the Bayes factor for model against model given 
the data Y = (Yi, • • • , 1^), which is defined as the ratio of posterior to prior odds, namely 


. pr{Y\M^) 
pr(Y\M,)' 

the ratio of the integrated likelihoods. In equation (3.1), 


(3.1) 


pr(Y\M^)= f pr{Y\M^„0^)p{^^\M^)d0^, (3.2) 

where is the corresponding regression parameter in Model and is its prior 

density in model M^. Note that corresponds to with '0 = 1- 

The Bayes factor is a summary of the evidence for against provided by the 
data. Sometimes it is useful to consider 21 ogB. 0 , which is on the same scale as the familiar 
deviance and likelihood ratio test statistics. In this paper we follow Raftery (1996) by using 
the rounded scale given in Table 1 of Raftery (1996) for interpreting or 21 ogR. 0 . 

This approach allows us to compare different parametric link families as follows. Let Mq 
denote a GLM using a link family indexed by the link parameter 6 and construct Bq in a 
similar fashion as B^. The quantity ^ then provides a summary of the evidence for model 
against model Mq given the data and the same set of independent variables. In a similar 
way we can construct comparisons of models with different sets of independent variables and 
link parameters. 

For the link families given in Table 1 it is also of interest to assess whether a right 
tail, left tail or a both tail modification is needed. For this we can compare B^^{B^^) and 
■®' 0 =(. 0 i ^ 2 ) individual link parameter values or construct overall Bayes factors for each 
tail modihcation, given by 


Both Tails : 

Bq = 

j 


(3.3) 

Right Tail: 

Bj. = 


(3.4) 

Left Tail: 


[ B^^pr{'ip2\M^^)d'ip2, 

(3.6) 


where pr(' 0 i|M^J and pr{'il) 2 \M^^) denote the corresponding prior densi¬ 

ties for 'i/’,'*/’! and ' 02 , respectively. If the link parameter values are not chosen in advance. 
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but instead are estimated, and B^ will tend to overstate the evidence for a modifi¬ 

cation. The average Bayes factors B^., Bi and Bi^ are preferable in this case, because they take 
into account the fact that the link parameters are unknown and thus take link uncertainty 
into account. For example, the ratio ^ will compare a both tails modification to a right tail 
one. In a similar fashion we can assess the evidence for one link family against another one 
given the same or different set of independent variables. 

To complete the specification of these Bayes factors, we have to select appropriate prior 
distributions for the regression parameters given a model with a specified link parameter as 
well as the prior distribution to be used for the link parameter to construct overall Bayes 
factors for a GLM with a specified link family. 

For the prior distribution of the regression parameters (3^ in the model we use 
the reference proper prior distributions suggested by Raftery (1996) for GLM’s, since for 
fixed values of the link parameter -0 we remain in the class of ordinary GLM’s. These prior 
distributions assume little prior information. They are based on adjusted dependent variables 
to mimic the behavior for ordinary linear regression models. For a (p -|- l)-dimensional (3^ 
including an intercept, we use the prior 


^p+i{yip-iQipUQ^')i 


(3.6) 


where S) denotes a p-dimensional normal distribution with mean vector /r and co- 

variance matrix S. To specify the quantities in (3.6), the adjusted dependent variable 
+ iVi ~ weights wf (McCullagh and Nelder (1989), p.40) has 

to be considered. Here fif denotes the MLE of the ith mean response in the GLM with link 
parameter and is the inverse of Dehne the weighted summary statistics: 




—Ip 


Z^i=i 


E n 


^ and st 


Er=i <( 




E n 

i=iw 


ip 


E n 'ip 

fel^a„d4=, 

Ei. ^ \ 


Efai 

T.U-'4 


, j = I-- - 


(3.7) 

(3.8) 


Then the prior mean is specified as v'^ = ■ ■ ■ ,p)', U denotes a diagonal matrix with 
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diagonal entries given by (1, • • • , a^) and 


Q,p = 4 


1 

0 

0 


xt 

1 

0 


0 0 


_ -tp 

^2 

*2 


7 

*2 


Sp 


0 

0 


Sp 


It remains to specify a^. The arguments of Raftery (1996) and subsequent experience using 
Bayes factors for GLM’s (e.g. Viallefont et al. (1998)) suggests using the value = 1. 

For the prior distribution of the link parameter ip we use a normal prior centered at 
the link parameter value corresponding to the canonical link. In particular, for the link 
family with specification (2.3) or (2.4) we use a normal prior with mean 1 and standard 
deviation while for the bivariate specification (2.2) involving -ip = {ipi,ip 2 ), we assume 
independence of the components and proceed as in the univariate specifications. Numerical 
experience with this particular link family suggests that = 2 is a reasonable choice. 

To approximate the Bayes factors of (3.1) we use the Laplace approximation for Bayes 
factors for GLM’s given in Raftery (1996), namely 


21ogR^ ^ xj + (-^^ --^o), (3.9) 

where x^ = dev{Mf.) — dev{M^). Here dev{M) denotes the deviance of model M. Let 
denote the observed or expected Fisher information matrix at the MLE 0^ in the model 
M^. Then in equation (3.9) is given by 

Ff^p lo§ 1^4 T G^|, (3.10) 

where G^p = {Q,pUQ'^)~^ is the inverse of the prior variance in (3.6) and is defined as 

Cf = — H^{21 — F^Htp)Gtp}^ where = {F^p + GpP) 


Finally, Eq is equal to where ip is taken to be the value corresponding to the canonical 
link. Equation (3.10) corresponds to equation (9) in Raftery (1996). 

To calaculate approximations to the overall Bayes factors specified in (3.3)-(3.5) we use 
the above approximation and numerically integrate out ip using the prior specifications for 

Ip. 
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4. Applications 

4.1 Beetle Mortality 

Bliss (1935) recorded the number of insects dead after five hours’ exposure to gaseous 
carbon disulphide at various concentrations and the data are presented in Table 2. This is 
a well known data set and has been often used to show the usefulness of a different link 
function other than the logistic one. In particular, the residual deviance for a logistic model 
with a centered log dose covariate is 11.23 with 6 degrees of freedom, suggesting a lack of fit. 

[Table 1 about here.] 

[Figure 1 about here.] 

Figure 1 gives the deviance profiles and contours, when the link families (2.2)-(2.4) are 
used for binomial regression. They show clearly that a tail modification in this data set 
is useful and improves the fit. We will now use Bayes factors to decide which specific 
tail modification is needed. We use the prior specification (3.6) with = 1 and normal 
independent priors for r/? with prior standard deviation = 2. Figure 2 shows the Bayes 
factors as a function of if) and in Table 3 the overall Bayes factors for each tail modification 
family are given together with minimal deviances and maximal individual Bayes factors B^. 

[Figure 2 about here.] 

[Table 2 about here.] 

From this we conclude that the Bayes factors clearly favors a right tail modification over 
a left or both tail modification. While the likelihood ratio test can be used to show that 
the reduction in deviance achieved by using a both tail modification over a right/or left 
tail modification is insignificant, we cannot compare right and left tail modifications, since 
they are not nested models. Graphically, we see that in Figure 1, the lines determining the 
point (1,1) (corresponding to logistic link) intersect the confidence regions suggesting that 
single tail modifications are sufficient. We can also see from Table 3 that the maximal Bayes 
factors, corresponding to estimated values of V’l and " 02 , overestimate the evidence for a 
modification quite substantially. 

This data set has also been considered by Collett (1991) p. 108-112, who allowed for the 
inclusion of a quadratic term on the original CS 2 scale in a logistic model. This yields a 
residual deviance of 3.08 with 5 degrees of freedom. We can now use Bayes factors to decide 
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if the right tail link fit is preferable over the inclusion of a quadratic term on the original 
CS 2 scale. Note these models are again nonnested. The corresponding Bayes factor is given 
by 


■BV’i = 1-99 


Pr I =l,a:=log(C52)) 
,csl )) 


1 

= 116.66 X .0011 = .1280 =- 

7.80 


This shows that a logistic model using a quadratic term on the original scale is favored over a 
right tail link family. Collett (1991) p. 140 noted that a complementary log-log model for the 
link parameter fits the data as well as the logistic model using a quadratic term. He argued 
that the complementary log-log model would be preferable since it has fewer parameters, 
but this ignores the uncertainty in the choice of link function. 

4.2 Rotifer Suspension 

The following example is taken from Collett (1991). It involves the number of rotifers 
falling out of suspension for two species, called polyartha major and keratella cochlearis for 
different fluid densities; the data are given in Table 6.10 in Collett (1991), p. 217. For the 
binary regression models considered below species were coded by 1 for polyartha major and 
0 for keratella cochlearis and a centered covariate for density x 100 was used. 

In this data set we have in addition to the link choice the problem of deciding whether or 
not to include an interaction term between species and density. A logistic regression analysis 
gives a residual deviance of 434.25 on 37 degrees of freedom for a model including no interac¬ 
tion term, while a model including an interaction term yields a residual deviance of 434.01 on 
36 degrees of freedom. This indicates a severe lack of fit and shows that an interaction term 
is not needed if only a logistic link is allowed. Therefore, this raises the question whether 
an interaction term would improve the fit when links other than the logistic are considered. 
So we consider six model classes corresponding to the three possible tail modifications and 
the two choices for set of covariates. Figures 3 and 4 present the corresponding deviance 
profiles and contours. This suggests that the inclusion of an interaction term decreases the 
residual deviance substantially. Further, a both tail modification substantially improves the 
fit, which can be seen since the lines determining the logistic link (r/? = (1,1)) do not inter¬ 
sect with the conhdence regions. This was also noted by Czado (1994b) who conducted a 
fully Bayesian analysis of this data set using Markov Chain Monte Carlo methods for joint 
inference on regression parameters and link parameters. 


[Figure 3 about here. 
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[Figure 4 about here. 


Figures 5 and 6 present the approximate Bayes factors as a function of 'ip, and in 
Table 4 the overall Bayes factors for each tail modification family and covariate set are given 
together with minimal deviances and maximal individual Bayes factors B^\ 

[Figure 5 about here.] 

[Figure 6 about here.] 

[Table 3 about here.] 

The overall log Bayes factors for the both tail modification is the largest for the model 
including an interaction term. Comparing whether the inclusion of an interaction term is 
warranted for a both tail modification we can see that the log Bayes factor for an interac¬ 
tion is 76.84 — 67.24 = 9.6, which corresponds to strong evidence for an interaction. Note 
that a conventional GLM analysis such as that of Collett (1991) would miss this important 
interaction. 

5. Discussion 

We have presented a Bayesian approach to model selection in GLM’s with parametric link 
using Bayes factors to account for structural model uncertainty (see Draper (1995)) such 
as the choice of link in a GLM. This involves a continous model expansion over ordinary 
GLM’s when a particular link family was considered as well as a discrete model expansion 
when different link families were compared. In addition we were able to jointly assess the 
choice of link together with the choice of the set of independent parameters to include in 
the model. This involves the comparison of nonnested models, which cannot be carried out 
using classical model selection strategies based on significance tests. 

We used reference proper priors for the regression parameters of a GLM with a fixed link 
function as suggested by Raftery (1996). These priors vary with the link parameter, reflecting 
the fact that the regression parameters are on different scales for different link functions. This 
reference proper prior avoids the problem of Bartlett’s (Bartlett (1957)) or Lindley’s (Bindley 
(1957)) paradox and thus in this case Bayes factors have the advantage over posterior Bayes 
factors (Aitkin (1991)), p-values or the AIC criterion that they correctly identify the correct 
model in large samples, while the other criteria do not (Schwartz (1978)). Finally, the Bayes 
factors were approximated using the Laplace approximations given in Raftery (1996). 
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A complete Bayesian analysis of a GLM’s with a parametric link is computer intensive, 
since the calculation of posterior distributions involve Markov Chain Monte Carlo methods 
(Czado (1994a)). The methods presented in this paper can be used for a final analysis, or 
could be used to screen for plausible models, which could then be used as starting points 
for a complete Bayesian analysis. Note our methods for calculating these Bayes factors only 
requires software which is able to fit a GLM with an arbitrary link. In particular, joint 
maximization over regression parameters and link parameters to determine the maximum 
likelihood estimator is not needed. Here, calculations were conducted in S-Plus using the 
glmO function together with integration functions in one or two dimensions. 

It should be noted that Bayes factors give summary statistics for the fit of a particular 
model or model class. For inference about model independent quantities such as the log odds 
ratio of a treatment effect or the mean response at a particular value of the independent 
variables techniques such as Bayesian model averaging (see for example Hoeting et al. (1999)) 
or MCMC methods to compute posterior quantities are required. This also allows a Bayesian 
alternative to the quantifications of change to quantities of interest when changing from a 
GLM with canonical link to one with noncanonical link. This was the goal of a paper by 
Czado and Munk (2000). 
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Figure 1. Deviance Profiles and Deviance Contours for the Beetle Mortality Data 
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Figure 2. Approximate Bayes Factor Profiles and Contours for the Beetle Mortality Data 
with Gp = 1 
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Figure 3. Deviance Profiles and Deviance Contours using no Interaction Term for the 
Rotifer Suspension Data 
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Figure 4. Deviance Profiles and Deviance Contours using an Interaction Term for the 
Rotifer Suspension Data 
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Figure 5. Approximate Bayes Factors Profiles and Contours using no Interaction Term for 
the Rotifer Suspension Data with ap = 1 
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Figure 6. Approximate Bayes Factors Profiles and Contours using an Interaction Term for 
the Rotifer Suspension Data with ap = 1 
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Table 1 

Beetle Mortality Data 


Yi 

Number killed 

rii 

Number of Insects 

Dose 

logio CS2mgl~^ 

6 

59 

1.6907 

13 

60 

1.7242 

18 

62 

1.7552 

28 

56 

1.7842 

52 

63 

1.8113 

53 

59 

1.8369 

61 

62 

1.8610 

60 

60 

1.8839 
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Table 2 

Approximate Bayes Factors and Deviances for the Beetle Mortality Data (up = l,a^ = 


Model 

Minimal 

Deviance 


df 

Maximal 
Bayes Factor 

(V’l,V’2) 

Overall 
Bayes Factor 

Right 

3.96 

(1.92,-) 

5 

116.66 

(1.99,-) 

20.61 

Left 

3.04 

(-,.16) 

5 

46.41 

(-,•21) 

5.07 

Both 

2.81 

(1.2,.3) 

4 

123.89 

(1.8,.8) 

5.38 
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Table 3 

Approximate Log Bayes Factors and Deviances for the Rotifer Suspension Data 

(Op — 1 , ( 7 ^ — 2,) 


Model 

Minimal 

Deviance 


df 

Maximal 
Log Bayes 
Factor 

(V’l,V’2) 

Overall 
Log Bayes 
Factor 

Right Tail 

no interaction 

422.8 

(1.3,-) 

36 

5.8 

(1.3,-) 

2.91 

interaction 

396.9 

o 

00 

35 

19.1 

(1.3,-) 

16.74 

Left Tail 

no interaction 

353.1 

(-,•4) 

36 

39.6 

(-,.07) 

36.03 

interaction 

287.7 

(-,.07) 

35 

71.3 

(-,.07) 

67.87 

Both Tails 

no interaction 

273.7 

(.l,-.2) 

35 

74.6 

(.2,-.l) 

67.24 

interaction 

255.0 

(.35,-.!) 

34 

83.6 

(.35,-.!) 

76.84 
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